Psychometric evaluation of the Adherence to Refills and Medications Scale (ARMS) in Australians living with gout

This study aimed to examine psychometric properties of the Adherence to Refills and Medications Scale (ARMS) in people with gout. We conducted exploratory factor analysis (EFA) and tested internal consistency (ordinal and Cronbach’s alpha coefficients) and agreement (intraclass correlation coefficient (2,1)) in ARMS scores across three timepoints (baseline, 6, and 12 months) in 487 people with gout. The Kruskal–Wallis test, Spearman’s rank, Kendall’s tau-b correlations, and logistic regression were used to examine the criterion-related validity of the ARMS and factors associated with the ARMS. EFA suggested a one-factor structure, explaining 43.2% of total variance. High internal consistency (ordinal alpha = 0.902 at baseline) and moderate agreement in ARMS scores over time (ICCs > 0.5; p < 0.001) were observed. Lower ARMS scores (indicating better adherence) predicted achieving target serum urate (OR, 0.89; 95% CI, 0.83–0.95; p < 0.001), but not urate-lowering therapy (ULT) adherence (Proportion of Days Covered (PDC) ≥ 80%) (OR, 0.93; 95% CI, 0.81–1.05; p = 0.261). Negative correlations between ARMS and PDC were not statistically significant (Kendall’s tau-b, r = −0.126, p = 0.078; Spearman’s rho = −0.173, p = 0.073). Differences in median ARMS scores (IQR) of 16 (14–20), 13 (12–15), and 17.5 (15–21) in three groups of participants who reported (1) not taking ULT, (2) taking ULT and adherent, and (3) taking ULT but not adherent, respectively, were statistically significant (p < 0.001). Age was the only patient factor independently associated with optimal adherence (ARMS score = 12) (OR, 1.91; 95% CI, 1.50–2.43; p < 0.001). The ARMS is a reliable and valid measure of medication adherence behaviours in people with gout, justifying its use in gout medication adherence research.

Key Points
• Valid, practical, and efficient methods of measuring adherence to medications are needed in people with gout.
• Commonly used medication adherence questionnaires have limited validity or have not been validated in people with gout.
• The Adherence to Refills and Medications Scale (ARMS) has been proven valid and practical in many chronic illnesses but has not been validated in people with gout.
• We showed the ARMS is valid and reliable for use in people with gout.

Supplementary Information
The online version contains supplementary material available at 10.1007/s10067-024-07050-y.


Introduction
Gout is the most common form of inflammatory arthritis in men [1]. Gout is caused by elevated serum urate (SU) concentrations leading to urate precipitation in joints, triggering recurrent painful gout flares [2]. Untreated, it can cause joint damage, disability, reduced quality of life, and increased risk of cardiovascular disease and mortality [3,4]. Successful gout management requires the lifelong use of urate-lowering therapies (ULT). Lowering SU concentrations below the saturation point of urate (< 0.36 mmol/L) promotes dissolution within the affected joints and reduces the risk of urate precipitation [5]. ULT removes the causative agent of flares and thus essentially 'cures' a person from the disease [5]. However, despite the availability of safe, generally well-tolerated, and effective ULT for gout [6], adherence to ULT remains sub-optimal [7].
Given the importance of ULT in gout management, tools are required to identify patients with gout at the greatest risk of non-adherence. Subjective measures of adherence, such as questionnaires, can help identify reasons for non-adherence, are relatively simple to use, and are less expensive than more objective measures [8]. However, the Medication Adherence Report Scale (MARS-10) [9] and the Morisky Medication Adherence Scale (MMAS-8) [10], which have been used to examine medication adherence in gout, either have limited validity or have not been validated specifically for use in gout.
The Adherence to Refills and Medications Scale (ARMS) is a valid and reliable medication adherence scale designed for use in chronic disease populations and patients with low literacy [11]. It was highly correlated with a commonly used self-reported adherence measure (Medication Adherence Questionnaire) [12] and a measure of refill adherence (cumulative medication gap) [11]. To date, the ARMS has been validated for use across numerous languages and chronic diseases, including diabetes, coronary heart disease, and breast cancer, but not in gout [13–17]. ARMS notably assessed patients' ability to take and refill their prescribed medications [11]. ARMS was designed for use in patients with low literacy, and simple wording was used [11]. Previous studies reported that the ARMS exhibited a high internal consistency (Cronbach's alpha ranged from 0.74 to 0.954) [11,13,15–20]. Regarding criterion-related validity, the ARMS was associated with a self-reported adherence measure developed by Morisky and colleagues [11], medication refill adherence during the previous 6 months [11], glycaemic control [13,18], and blood pressure control [15,16].
We explored the psychometric properties of the ARMS in people living with gout. Specifically, we aimed to examine whether the ARMS's two-factor structure (i.e. the medication-taking behaviour and refill behaviour subscales) applies to people with gout. We also examined the internal consistency and agreement in ARMS scores across three timepoints (baseline, 6, and 12 months). The value of the ARMS as a predictor of adherence in general and ULT adherence specifically, as well as achievement of target serum urate (SU), was also examined.

Study design and participants
This study utilised data from the Gout APP (GAPP) trial [21], a 12-month randomised-controlled trial (RCT) examining the effectiveness of a mobile app designed to help people self-manage their gout and achieve the target SU concentration (≤ 0.36 mmol/L) by adhering to ULT. Participants were randomised to receive either an intervention or control app to use for 12 months. Participants completed surveys and blood tests for SU at baseline, 6, and 12 months.

Inclusion and exclusion criteria
Eligible participants were over 18 years of age, residents of Australia, diagnosed with gout, reported having at least one gout flare in the preceding 12 months, were receiving or eligible to start/restart ULT, and had access to a smartphone or tablet device with internet access. Exclusions included insufficient technological skills to use the mobile app, limited understanding of English, or a psychological condition precluding participation.

Ethical considerations
The GAPP trial was approved by the University of New South Wales's Human Research Ethics Committee (HC15199 and HC210543). All participants provided written informed consent.

The adherence to refills and medications scale
The ARMS is a 12-item ordinal scale developed to measure general medication adherence behaviour [11]. There are two subscales: eight items comprising the 'medication-taking' subscale, which measures a person's ability to take prescribed medications as directed, and four items comprising the 'prescription-refill' subscale, which evaluates a person's adherence to obtaining prescription refills. Each item is rated on a four-point Likert scale: 1 (none of the time), 2 (some of the time), 3 (most of the time), and 4 (all of the time). Responses are summed to produce an overall adherence score ranging from 12 to 48, with a higher score indicating poorer general medication adherence.
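The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and any reverse-coded item is assumed to have been recoded before the responses are passed in.

```python
from typing import Sequence

ARMS_ITEMS = 12                     # number of items in the scale
MIN_RESPONSE, MAX_RESPONSE = 1, 4   # four-point Likert range


def arms_total(responses: Sequence[int]) -> int:
    """Sum the 12 item responses into a total score (12 to 48)."""
    if len(responses) != ARMS_ITEMS:
        raise ValueError(f"expected {ARMS_ITEMS} responses, got {len(responses)}")
    for r in responses:
        if not MIN_RESPONSE <= r <= MAX_RESPONSE:
            raise ValueError(f"response {r} is outside the 1-4 range")
    return sum(responses)


def is_optimal(total: int) -> bool:
    """Optimal adherence as dichotomised in this study: the lowest score, 12."""
    return total == ARMS_ITEMS * MIN_RESPONSE
```

A participant answering 'none of the time' to every item scores 12 (optimal adherence), and any other response pattern yields a higher, sub-optimal score.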

Self-reported ULT-taking status and daily doses of ULT
The GAPP survey asked participants to report whether they were taking any medication for their gout at the time of the survey and the name, strength, daily dose, and frequency (e.g. once a day) of gout medications. The survey also asked a multi-response question: 'If there are times that you do not take your medications for your gout, what is/are the reason(s) for this? Please select all that apply.' If a participant who reported taking ULT selected only 'I always take my medications' from the five options to the multi-response question, this was considered 'self-reported adherence to ULT'. The self-reported ULT-taking status was categorised as follows: (1) 'not taking ULT', (2) 'taking ULT and adherent', and (3) 'taking ULT but not adherent'.
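The categorisation rule above can be expressed as a small function. This is a minimal sketch: the function name is hypothetical, and treating a participant on ULT who selected no option at all as 'not adherent' is an assumption, since the text does not say how that case was handled.

```python
from typing import Iterable

# Option wording as quoted in the text; the other four options are not listed there.
ALWAYS = "I always take my medications"


def ult_status(taking_ult: bool, reasons: Iterable[str]) -> str:
    """Map survey answers to the three self-reported ULT-taking categories."""
    if not taking_ult:
        return "not taking ULT"
    # Adherent only if 'I always take my medications' is the sole option selected.
    if set(reasons) == {ALWAYS}:
        return "taking ULT and adherent"
    # Assumption: any other selection pattern (including none) counts as not adherent.
    return "taking ULT but not adherent"
```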

Estimating ULT adherence
Adherence to ULT was estimated [22] from the Pharmaceutical Benefits Scheme (PBS) dispensing claims data using Proportion of Days Covered (PDC) and patient-reported daily doses [23]. Participants in the GAPP trial optionally consented for the release of their PBS claims data covering a 3-year period. PBS data was supplied by Services Australia and included information on all dispensings for each participant in this 3-year period with the dates of dispensings, formulation strength, and number of tablets supplied. Briefly, the baseline PDC was estimated in the approximately 12-month period before the link to the app was sent to participants. The PDC was calculated in participants who reported taking, and were dispensed, any ULT drug (allopurinol or febuxostat) using the formula:

$$\mathrm{PDC} = \frac{\text{Total days covered given the estimated daily dose}}{\text{Total number of days from first to last ULT dispensing}} \times 100$$

Participants with PDC values ≥ 80% were considered adherent [24]. More information on the methodology used to estimate the PDC can be found in Online Resource 1, available at Rheumatology online.

Statistical analyses
Data were analysed using SPSS statistical software (IBM SPSS Statistics 27.0, Chicago, IL), with a level of significance set at 0.05 and all confidence intervals (CIs) set at 95%. Ordinal alpha coefficients were computed using SAS software (version 9.4, the SAS System for Windows, SAS Institute Inc., Cary, NC, USA). Descriptive analyses of patient characteristics and ARMS item scores and exploratory analyses to evaluate missing data and examine score distribution and item response patterns (floor or ceiling effects) were conducted. We compared baseline characteristics between participants who completed the ARMS at all three timepoints and those who did not. Baseline differences were evaluated using appropriate statistical tests, including Pearson's chi-square test, independent samples t-test, or its non-parametric equivalent where applicable.

Construct validity
As the ARMS scale was ordinal and its distribution skewed, polychoric correlation matrices were computed to evaluate the scale's construct validity [25]. A free and comprehensive POLYMAT-C program for SPSS was used for the computation of polychoric correlations [26]. Polychoric correlation coefficients were chosen in preference to Pearson's correlation coefficients as the polychoric correlation method has been demonstrated to produce unbiased parameter estimates for exploratory factor analysis (EFA) and more accurate estimations of dimensionality than other methods using ordinal variables [25,27].
To ensure the suitability of the item pool for factor analysis, Kaiser-Meyer-Olkin's test and Bartlett's test of sphericity were computed [28]. The appropriateness of inter-item correlation matrices was explored by visual examination to ensure that coefficients were above 0.30 but below 0.90 [29]. To determine the number of factors to be retained in a subsequent EFA, as recommended [28], a combination of parallel analysis (PA) and Velicer's minimum average partial (MAP) test was utilised, with scree plots reserved as a useful addition to make the final decision. EFA was performed to test the scale's structure.
We used multiple imputation (MI) to address missing ARMS data. We imputed 25 datasets assuming either missing at random (MAR) or missing not at random (MNAR) scenarios using the mice (Multivariate Imputation by Chained Equations) package in R (version 4.4.0). The predictive mean matching method was used for MAR [30]. The pattern-mixture model method was used for MNAR [31,32] to allow for the assumption that participants with missing data at follow-up who had sub-optimal adherence at baseline (ARMS score > 12) also had sub-optimal adherence at follow-up. Imputed item responses for these participants were adjusted by adding 0.25 to each imputed follow-up item response value [31,32]. If the imputed item response score was 4 (the highest item score possible), it was left as 4 in the dataset. Next, a polychoric correlation matrix was computed for each imputed dataset and an EFA was performed forcing a one-factor structure. Finally, the package mifa (Multiple Imputation for Factor Analysis) was used to calculate the proportion of variance accounted for by the one factor [30], and the factor loadings were extracted to examine the minimum, median, and maximum values for the 25 datasets.
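The MNAR adjustment described above (a fixed shift applied to imputed follow-up item responses, capped at the scale maximum) can be illustrated as follows. This is a sketch in Python rather than the R workflow used in the study; the function name and the list-based data layout are assumptions made for illustration.

```python
DELTA = 0.25   # shift added to each imputed follow-up item response
MAX_ITEM = 4   # highest possible item score


def mnar_adjust(imputed_items, baseline_total, delta=DELTA):
    """Shift imputed follow-up item values for participants whose baseline
    adherence was sub-optimal (ARMS score > 12); leave others unchanged."""
    if baseline_total <= 12:
        return list(imputed_items)
    # Values already at the maximum stay at 4; nothing may exceed it.
    return [min(value + delta, MAX_ITEM) for value in imputed_items]
```

For a participant with a baseline score of 15, imputed item values of 1, 2, and 4 become 1.25, 2.25, and 4, respectively.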

Internal consistency, agreement, and responsiveness
Internal consistency reliability of the ARMS was assessed using ordinal alpha and Cronbach's alpha coefficients at baseline, 6, and 12 months. A coefficient ≥ 0.70 was considered acceptable [33]. The developers of the ARMS found it to have an acceptable internal consistency according to Cronbach's alpha coefficient [11], the use of which in ordinal scales, however, was disputed [34]. As such, the ordinal alpha was calculated [35]. To assess the agreement of the scale over time, we examined intraclass correlation coefficients (ICC, 2,1) between the baseline and each of the two follow-up timepoints: 6 and 12 months. The ICC form of '2,1' was chosen as it measures absolute agreement using two-way random effects and assumes a single measurement at each timepoint for each participant [36]. The ability of ARMS to detect change in medication adherence over time (responsiveness) was assessed in all participants from baseline to 12 months using a Wilcoxon Signed-Ranks test.
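For readers unfamiliar with these coefficients, the following Python sketch shows the standard textbook formulas for Cronbach's alpha and ICC(2,1). It is not the SPSS or SAS procedure used in the study, the function names are hypothetical, and it does not cover ordinal alpha, which applies the same alpha formula to a polychoric rather than a Pearson correlation matrix.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """rows: one list of item scores per participant."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_2_1(rows):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.
    rows: one list per participant, one score per timepoint."""
    n, k = len(rows), len(rows[0])
    grand = mean(v for row in rows for v in row)
    row_means = [mean(row) for row in rows]
    col_means = [mean(row[j] for row in rows) for j in range(k)]
    # Two-way ANOVA sums of squares: participants, timepoints, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Because ICC(2,1) penalises systematic shifts between timepoints as well as random disagreement, a cohort-wide change in scores over follow-up lowers the coefficient even when participants keep their relative ranking.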

Criterion-related validity and predictors of optimal medication adherence
After finding an optimal scale structure, the criterion-related validity of the scale was examined using data from all participants who completed ARMS at baseline. ARMS scores were dichotomised at the lowest score of 12, which was considered to indicate optimal medication adherence behaviour. We separately examined the association of baseline ARMS scores and optimal adherence (ARMS score = 12), respectively, with baseline target SU (≤ 0.36 mmol/L) using multivariable logistic regression adjusting for age and sex. These analyses were repeated with claims-data-derived ULT adherence (PDC ≥ 80%) at baseline as the outcome variable.
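As an aid to reading the adjusted odds ratios from these models, the logistic regression with the continuous ARMS score can be written as follows. This is an illustrative sketch, not the fitted model, and the coefficient symbols are notation introduced here:

$$\operatorname{logit} P(\text{SU} \le 0.36\ \text{mmol/L}) = \beta_0 + \beta_1\,\text{ARMS} + \beta_2\,\text{age} + \beta_3\,\text{sex}, \qquad \text{OR}_{\text{ARMS}} = e^{\beta_1}$$

Assuming the odds ratio of 0.89 reported in the abstract is per one-point increase in ARMS score and that the effect is linear on the logit scale, a participant scoring 5 points lower would have odds of achieving target SU multiplied by

$$0.89^{-5} = \frac{1}{0.89^{5}} \approx \frac{1}{0.558} \approx 1.79.$$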
A Kruskal-Wallis test was performed to examine whether there was a difference in ARMS scores between the three categories of self-reported ULT-taking status.
Lastly, a forward stepwise binary logistic regression procedure was used to identify predictors of optimal adherence (ARMS score = 12). A criterion of p < 0.1 was required for variables to be evaluated in the multivariable model, but to be retained in the final model, the criterion required was p < 0.05.
Additionally, we assessed the criterion-related validity of the ARMS and examined factors associated with optimal adherence at baseline for those who completed the ARMS at all three timepoints.
The mean SU concentration (SD) of the cohort was 0.43 (0.14) mmol/L, with 25% at or below the recommended target SU (≤ 0.36 mmol/L) at baseline. Almost two-thirds (63%) reported taking a ULT at the time of the baseline survey, and 182 (37%) reported not taking ULT. Of those who reported taking ULT, 144 (30%) reported they were adherent and 161 (33%) reported they were not adherent.

Construct validity
Inspection of the inter-item polychoric correlation matrix at each timepoint revealed that most items had some correlation with each other, and multicollinearity was not evident as correlations did not exceed 0.8. The Kaiser-Meyer-Olkin measure of sampling adequacy was 0.89 and Bartlett's test of sphericity (BT 2875.041) was statistically significant (p < 0.0001). These results indicated that the data were suitable for conducting an EFA.
Results from parallel analysis (PA) and minimum average partial (MAP) tests in addition to factor scree plots at each timepoint demonstrated that a one-factor structure would be optimal. Thus, EFA was performed using principal axis factoring (PAF) with no rotation and forcing a one-factor structure. As all items had a factor loading above 0.5 (Table 2), it was decided to retain all items. One factor accounted for 43.2% of the variance and had an eigenvalue of 5.188. A two- or three-factor solution did not result in a clear separation of the items.
EFA performed on 25 imputed datasets showed that median factor loadings exceeded 0.3 at all timepoints, and the aggregated mean estimates for variance in item responses explained by the one factor were above 30% under either missingness mechanism (MAR or MNAR) (Supplementary Tables S2 and S3). No evidence suggested that a one-factor solution was not optimal when considering the missing data mechanism as either MAR or MNAR.

Internal consistency, agreement, and responsiveness
Ordinal alpha coefficients were 0.902 at baseline, 0.903 at 6 months, and 0.907 at 12 months, indicating high internal consistency. In comparison, Cronbach's alpha coefficients were 0.790 at baseline, 0.773 at 6 months, and 0.773 at 12 months, indicating acceptable internal consistency. The intraclass correlation coefficient (ICC, 2,1) indicated an overall moderate agreement in ARMS scores across timepoints (ICC, baseline to 6 months = 0.518, p < 0.001; ICC, baseline to 12 months = 0.573, p < 0.001). A Wilcoxon Signed-Ranks test indicated that the ARMS score (mean rank = 123.56) at 12 months was significantly lower than the ARMS score at baseline (mean rank = 136.72), with a Z-score of −3.584 and a p-value of < 0.001.
Similar results were found when repeating these analyses in the 311 participants who completed the ARMS at all timepoints. Participants with optimal adherence were more than twice as likely to achieve target SU than those with sub-optimal adherence (OR adjusted for age and sex, 2.43; 95% CI, 1.31–4.53; p < 0.05). Using a Kruskal-Wallis test, the lowest median ARMS score (IQR) (13 (12–14)) was also observed in the 'taking ULT and adherent' group, while the highest (17 (15–21)) was observed in the 'taking ULT but not adherent' group (p < 0.001).

ULT adherence
In 108 participants who had two or more ULT dispensings in the 12-month period before the app link was sent to them, a PDC had been calculated using self-reported dosing data from the baseline survey along with the PBS claims data of dispensings [22]. These 108 people comprised 101 taking allopurinol and 7 taking febuxostat. The mean (SD) and median (IQR) PDCs were 83% (21%) and 92% (70–100%), respectively, with 70 (64.8%) categorised as being adherent to ULT (PDC ≥ 80%).
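To make the PDC calculation concrete, the following Python sketch computes a PDC from dispensing records and a self-reported daily dose. It is a simplified illustration and not the method detailed in Online Resource 1: the function name and record layout are hypothetical, and excluding the final dispensing from the numerator and capping the result at 100% are assumptions made here.

```python
from datetime import date


def proportion_of_days_covered(dispensings, daily_dose_mg):
    """dispensings: list of (dispensing_date, strength_mg, tablets) tuples.
    Returns the PDC as a percentage, capped at 100."""
    if len(dispensings) < 2:
        raise ValueError("at least two dispensings are needed")
    ordered = sorted(dispensings, key=lambda d: d[0])
    period_days = (ordered[-1][0] - ordered[0][0]).days
    if period_days <= 0:
        raise ValueError("first and last dispensing must be on different dates")
    # Assumption: supply from the final dispensing is used after the
    # observation window ends, so it is left out of the numerator.
    covered = sum(strength * tablets / daily_dose_mg
                  for _, strength, tablets in ordered[:-1])
    return min(covered / period_days * 100, 100.0)


def is_adherent(pdc, threshold=80.0):
    """Adherence threshold used in the study: PDC of at least 80%."""
    return pdc >= threshold
```

For example, two 30-tablet supplies of 100 mg tablets taken at 100 mg daily over a 120-day window between the first and last dispensing cover 60 days, giving a PDC of 50%, which falls below the 80% threshold.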

Discussion
ULT is key to reducing urate concentrations in people with gout, thereby avoiding further gout flares; however, adherence to ULT is notably poor. Recent systematic reviews indicate that adherence to ULT assessed from prescription refills recorded in claims databases is as low as 10% and as high as 47% [37,38]. This is generally lower than other chronic diseases including hypertension (53–71%) [39], diabetes mellitus (20–88%) [40], and rheumatoid arthritis (9.3–94%) [41]. In primary care, studies have found that between 25 and 73% of patients with gout were prescribed ULT, while only between 41 and 70% achieved the target SU concentration [42–46]. Factors contributing to poor ULT adherence have been identified. Gout is stigmatised as a disease of opulence and widely believed to be controllable with diet and lifestyle modifications, which negatively impacts adherence to medications [47]. People with gout may stop ULT due to the long periods without flares [48]. Further, the risk of acute gout flares with ULT initiation induces poor adherence, which emphasises the importance of effective education of people commencing ULT and provision of flare prophylaxis upon ULT initiation [7,49]. For successful treatment with ULT, patients and health professionals need to understand the importance of monitoring SU concentrations to titrate to an appropriate ULT dose and maintain SU concentrations below the target level [46,50,51]. As demonstrated from our study on the association between the ARMS and achievement of target SU, the ARMS may be a valuable tool in identifying patients at greatest risk of not reaching target SU. The ARMS has been validated in people with at least one chronic disease [11,17,19,20,52], hypertension [15,16], and type 2 diabetes mellitus [13,18]. Consistent with this, our findings confirm that the ARMS is a valid and reliable measure of the ability to correctly self-administer and refill prescriptions of prescribed medications in people with gout. Interestingly, the ARMS appeared to measure these two medication adherence behaviours as a single construct.
In the validation study of the original version of ARMS [11], the two-factor analysis resulted in factor 1 accounting for 35.1% and factor 2, 10% of variance. When combined, almost as much variance was explained (45.1%) as our study's one factor (43.2%). A two-factor solution was also reported in six studies validating the ARMS in other languages and an adapted version for measuring adherence to diabetes medicines [15–20,52]. On the other hand, the validation of the Korean version in adults with type 2 diabetes [13] demonstrated that 3 dimensions explained 54.7% of the total variance. Our results indicate a one-factor structure was optimal, and a two- or three-factor solution did not result in clear item separation. Our findings may be explained by the characteristics of our cohort, our choice of conducting factor analysis using polychoric inter-item correlations, using a parallel analysis (PA) criterion [53], and undertaking a minimum average partial (MAP) test [28]. Summated rating scales with Likert-type response items, such as the ARMS, produce scores that are ordinal in nature and often highly skewed. When such scales are used, it has been recommended that factor analyses should be conducted on the matrix of polychoric inter-item correlations instead of Pearson correlations for more accurate models [25]. Interestingly, only one [20] of the nine previous validation studies [11,13,15–19,52] used either the PA or MAP methods to decide how many factors to extract in the factor analysis.
All ordinal alpha coefficients were > 0.9, indicating high internal consistency [33]. This finding is consistent with the validation study of the original version (Cronbach's alpha = 0.814) [11] and validation studies in other populations [13,15–20,52]. Since a Pearson covariance matrix is routinely used in the calculation of Cronbach's alpha, which assumes that the data is continuous [54], it may be best to report alternatives to Cronbach's alpha in examining the internal consistency of the ARMS. Unlike the previous studies mentioned, we observed a moderate agreement in ARMS scores over time [36], and a decrease in ARMS scores between baseline and 12 months. This was expected as the intervention, or even participation, in the GAPP trial might have altered the ARMS scores in participants over time.
Previous research has associated the ARMS with the MMAS-8, blood pressure control [15,16], and glycaemic control [11,13,18,19]. In our study, participants were almost twice as likely to have achieved target SU if they had optimal adherence (ARMS score = 12) than those with sub-optimal adherence. Our study also demonstrated the association of ARMS with self-reported ULT-taking status as the median ARMS score was near-optimal in participants who reported taking ULT and being adherent to ULT. These participants were more likely to have a better medication adherence behaviour (lower ARMS scores) compared to participants who did not report taking ULT or participants who reported taking ULT but not being adherent to ULT.
One source of weakness in this study, which we believe decreased overall ARMS scores in participants over time, was the number of participants lost to follow-up. Better medication adherence behaviours (lower ARMS scores) among participants who completed the ARMS at all timepoints (compared to those who did not) may partly be attributed to differences in patient characteristics between these groups. We found that retirees were more likely to complete the survey at all timepoints while older participants reported better medication adherence. However, the factor structure and loadings remained consistent across three timepoints despite attrition at 6-month and 12-month timepoints, and the results from MI analyses indicated that our EFA results were not significantly influenced by either MAR or MNAR mechanisms. In fact, using imputed datasets under MAR and MNAR assumptions yielded factor loadings and item response variances consistent with those obtained from the factor analysis performed on the complete sample.
An advantage of this study is the availability of ARMS data at three timepoints to examine the factor structure and internal consistency of the scale. The results consistently identified the one-factor structure, and calculation of ordinal alpha using polychoric correlations showed high internal consistency at each timepoint. Another strength is the use of an objective clinical measure (SU) for validation, as well as additional adherence measures. Limitations of the study included missing data at follow-up timepoints and the GAPP trial's selection criteria excluding those without access to a mobile phone (or tablet device) or the Internet, and those with limited English language proficiency.

Conclusion
The ARMS is a well-validated tool for use in populations with low literacy and in numerous chronic diseases. Our psychometric findings indicate that the ARMS is a valid and reliable tool for measuring the ability to self-administer and refill prescribed medications in people with gout. The 12-item ARMS was able to discriminate between groups with different levels of serum urate concentrations, as well as groups with different self-reported ULT-taking status. Thus, given the low rates of ULT adherence and the central role of ULT in managing gout, the ARMS may be a useful tool for identifying opportunities to improve gout management.

Table 1
Sociodemographic and clinical characteristics of participants at baseline (n = 487)

BMI body mass index, GP general practitioner, IQR interquartile range, SD standard deviation, SU serum urate, ULT urate-lowering therapy, PDC Proportion of Days Covered
All results are presented in n (%) unless otherwise indicated. Patients who responded either 'I do not know' or 'I would rather not respond' were treated as missing values

Table 2
Factor loadings from principal axis factoring with no rotation and forcing a one-factor structure at each timepoint

Table 5
Patient factors associated with optimal adherence (ARMS score = 12) at baseline
ARMS Adherence to Refills and Medications Scale, CI confidence interval, GP general practitioner, OR odds ratio
a Variables included in the multivariable model were 'BMI (body mass index)', 'White/Caucasian/European as the sole ancestry', 'Age (for every ten years)', 'Employment status', 'Number of comorbidities', 'Binge-drinking behaviour', 'Living with co-dependents', and 'Seen GP for gout in last 6 months'. Only 'Age' remained in the final model